```python
# Suppress only UserWarning
import warnings
warnings.filterwarnings('ignore', category=UserWarning)
```

# Neural Network Foundations
Related resources:

- *Deep Learning for Coders with fastai and PyTorch: AI Applications Without a PhD*, Chapter 4
- Course notebooks
## Detect if the notebook is running on Kaggle

It's a good idea to ensure you're running the latest version of any libraries you need. `!pip install -Uqq <libraries>` upgrades the listed libraries to their latest versions.

```python
import os

iskaggle = os.environ.get('KAGGLE_KERNEL_RUN_TYPE', '')
if iskaggle:
    print('Is running on Kaggle.')
    !pip install -Uqq fastai
```

## Choosing the best image model
### timm
PyTorch Image Models (timm) is a wonderful library by Ross Wightman which provides state-of-the-art pre-trained computer vision models. It’s like Huggingface Transformers, but for computer vision instead of NLP (and it’s not restricted to transformers-based models)!
Ross has been kind enough to help me understand how best to take advantage of this library by identifying the top models. I'm going to share some of what I've learned from him here, plus some additional ideas.
### The data
Ross regularly benchmarks new models as they are added to timm, and puts the results in a CSV in the project’s GitHub repo.
```python
import pandas as pd

# Load the ImageNet results data first
df_results = pd.read_csv('image_model_results/results-imagenet.csv')
df_results['merge_key'] = df_results['model'].str.split('.', n=1).str[0]

def get_data(col):
    # Load the benchmark data
    df_bench = pd.read_csv('image_model_results/benchmark-infer-amp-nhwc-pt240-cu124-rtx4090.csv')
    df = df_bench.merge(df_results, left_on='model', right_on='merge_key', suffixes=('_bench', '_results'))
    model_col_for_family = 'model_bench'
    df['secs'] = 1. / df[col]
    # Extract family based on the benchmark model name
    df['family'] = df[model_col_for_family].str.extract(r'^([a-z]+?(?:v2)?)(?:\d|_|$)')
    # Filter out models ending in 'gn'
    df = df[~df[model_col_for_family].str.endswith('gn')]
    # Update family based on conditions (use the correct model column again)
    df.loc[df[model_col_for_family].str.contains('in22', na=False), 'family'] = df.loc[df[model_col_for_family].str.contains('in22', na=False), 'family'] + '_in22'
    df.loc[df[model_col_for_family].str.contains(r'resnet.*d', na=False), 'family'] = df.loc[df[model_col_for_family].str.contains(r'resnet.*d', na=False), 'family'] + 'd'
    # Keep only the families we want to chart
    if 'family' in df.columns and not df['family'].isnull().all():
        df_filtered = df[df['family'].str.contains('^re[sg]netd?|beit|convnext|levit|efficient|vit|vgg|swin', na=False)]
        return df_filtered
    else:
        print("Warning: 'family' column is missing or empty before final filtering.")
        return pd.DataFrame(columns=df.columns)
```
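To see what the family-extraction regex captures, it can be sanity-checked on a few illustrative model names (the names below are just examples, not drawn from the benchmark CSV):

```python
import re

# The same pattern used with str.extract above
pattern = r'^([a-z]+?(?:v2)?)(?:\d|_|$)'

# Illustrative model names, not necessarily present in the data
for name in ['resnet50', 'convnext_base', 'efficientnetv2_s', 'vit_base_patch16']:
    m = re.match(pattern, name)
    print(name, '->', m.group(1) if m else None)
# resnet50 -> resnet
# convnext_base -> convnext
# efficientnetv2_s -> efficientnetv2
# vit_base_patch16 -> vit
```

The lazy `[a-z]+?` stops at the first digit, underscore, or end of string, with an optional `v2` allowed inside the family name so that e.g. `efficientnetv2` variants aren't lumped in with `efficientnet`.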
```python
df = get_data('infer_samples_per_sec')
```

## Inference results
Here are the results for inference performance (see the last section for training performance). In this chart:
- the x axis shows how many seconds it takes to process one image (note: it’s a log scale)
- the y axis is the accuracy on Imagenet
- the size of each bubble is proportional to the size of images used in testing
- the color shows what “family” the architecture is from.
Hover your mouse over a marker to see details about the model. Double-click in the legend to display just one family. Single-click in the legend to show or hide a family.
Note: on my screen, Kaggle cuts off the family selector and some plotly functionality – to see the whole thing, collapse the table of contents on the right by clicking the little arrow to the right of “Contents”.
```python
import plotly.express as px

w, h = 1000, 800

def show_all(df, title, size):
    return px.scatter(df, width=w, height=h, size=df[size]**2, title=title,
                      x='secs', y='top1', log_x=True, color='family',
                      hover_name='merge_key', hover_data=[size])

show_all(df, 'Inference', 'infer_img_size')
```

## Fitting a function with gradient descent
I can't express how much I enjoyed this part. Jeremy explains it so beautifully and simply that it lowers the barrier for everyone. He walks us step by step toward understanding the details of neural network foundations with simple, intuitive examples and visualizations, breaking down the scary keywords and jargon with clear explanations.
Then he explains how a neural network can approximate any given function, and shows the magic of combining multiple ReLUs.
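As a minimal sketch of that idea (layer sizes, step count, and learning rate are made up, and the gradients are written out by hand rather than using a framework), a single hidden layer of ReLUs trained by plain gradient descent can approximate a simple curve such as f(x) = x²:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 100)[:, None]   # inputs
y = x**2                               # the function we want to approximate

# One hidden layer of 20 ReLUs (sizes and learning rate are made up)
w1 = rng.normal(size=(1, 20)); b1 = np.zeros(20)
w2 = rng.normal(size=(20, 1)); b2 = np.zeros(1)

lr = 0.01
losses = []
for step in range(5000):
    h = np.maximum(x @ w1 + b1, 0)     # ReLU activations
    y_hat = h @ w2 + b2                # a sum of ReLUs: a piecewise-linear fit
    err = y_hat - y
    losses.append((err**2).mean())     # mean squared error
    # Backpropagate the MSE gradient by hand
    g_yhat = 2 * err / len(x)
    g_w2 = h.T @ g_yhat; g_b2 = g_yhat.sum(0)
    g_h = g_yhat @ w2.T
    g_pre = g_h * (h > 0)              # ReLU passes gradient only where active
    g_w1 = x.T @ g_pre; g_b1 = g_pre.sum(0)
    for p, g in ((w1, g_w1), (b1, g_b1), (w2, g_w2), (b2, g_b2)):
        p -= lr * g                    # plain gradient descent step

print(f'MSE: {losses[0]:.3f} -> {losses[-1]:.3f}')
```

Each ReLU contributes one "bend", so with enough of them the squiggly line can follow the curve as closely as we like.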
The most fascinating part of Jeremy's lecture was where he showed how he built a neural network in Excel. I know the Excel fans loved it, but to me it shows how thoroughly Jeremy has mastered the concept, such that he can explain it beautifully and make it memorable for everyone.
Finally, I found Jeremy's response to those who asked what comes next after learning the basics and foundations of deep learning important, so I quote it here in full.
### How to recognise an owl
OK great, we've created a nifty little example showing that we can draw squiggly lines that go through some points. So what?
Well… the truth is that actually drawing squiggly lines (or planes, or high-dimensional hyperplanes…) through some points is literally all that deep learning does! If your data points are, say, the RGB values of pixels in photos of owls, then you can create an owl-recogniser model by following the exact steps above.
Students often ask me at this point “OK Jeremy, but how do neural nets actually work”. But at a foundational level, there is no “step 2”. We’re done – the above steps will, given enough time and enough data, create (for example) an owl recogniser, if you feed in enough owls (and non-owls).
The devil, I guess, is in the “given enough time and enough data” part of the above sentence. There’s a lot of tweaks we can make to reduce both of these things. For instance, instead of running our calculations on a normal CPU, as we’ve done above, we could do thousands of them simultaneously by taking advantage of a GPU. We could greatly reduce the amount of computation and data needed by using a convolution instead of a matrix multiplication, which basically means skipping over a bunch of the multiplications and additions for bits that you’d guess won’t be important. We could make things much faster if, instead of starting with random parameters, we start with parameters of someone else’s model that does something similar to what we want (this is called transfer learning).
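The point about a convolution skipping unneeded multiplications can be illustrated with a toy 1-D example (made-up numbers; as in deep learning libraries, no kernel flip is applied):

```python
import numpy as np

x = np.arange(6.0)           # a tiny 1-D "image": [0, 1, 2, 3, 4, 5]
k = np.array([1.0, -1.0])    # a 2-tap kernel (a simple edge detector)

# Sliding-window convolution: 5 outputs, only 2 multiplies each
direct = np.array([x[i] * k[0] + x[i + 1] * k[1] for i in range(5)])

# The same operation as a dense matrix multiplication: W is mostly
# zeros, so the dense version performs many multiplications by zero
# that the sliding window simply skips
W = np.zeros((5, 6))
for i in range(5):
    W[i, i:i + 2] = k
as_matmul = W @ x

print(direct)                         # [-1. -1. -1. -1. -1.]
print(np.allclose(direct, as_matmul)) # True
```

The weight sharing (every row of `W` holds the same kernel) is also why a convolution needs far less data to learn than a full matrix of independent parameters.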